109 research outputs found

    Measuring Tie Strength in Implicit Social Networks

    Full text link
    Given a set of people and a set of events they attend, we address the problem of measuring connectedness or tie strength between each pair of persons given that attendance at mutual events gives an implicit social network between people. We take an axiomatic approach to this problem. Starting from a list of axioms that a measure of tie strength must satisfy, we characterize functions that satisfy all the axioms and show that there is a range of measures that satisfy this characterization. A measure of tie strength induces a ranking on the edges (and on the set of neighbors for every person). We show that for applications where the ranking, and not the absolute value of the tie strength, is the important thing about the measure, the axioms are equivalent to a natural partial order. Also, to settle on a particular measure, we must make a non-obvious decision about extending this partial order to a total order, and that this decision is best left to particular applications. We classify measures found in prior literature according to the axioms that they satisfy. In our experiments, we measure tie strength and the coverage of our axioms in several datasets. Also, for each dataset, we bound the maximum Kendall's Tau divergence (which measures the number of pairwise disagreements between two lists) between all measures that satisfy the axioms using the partial order. This informs us if particular datasets are well behaved where we do not have to worry about which measure to choose, or we have to be careful about the exact choice of measure we make.Comment: 10 page

    L2P: An Algorithm for Estimating Heavy-tailed Outcomes

    Full text link
    Many real-world prediction tasks have outcome variables that have characteristic heavy-tail distributions. Examples include copies of books sold, auction prices of art pieces, demand for commodities in warehouses, etc. By learning heavy-tailed distributions, "big and rare" instances (e.g., the best-sellers) will have accurate predictions. Most existing approaches are not dedicated to learning heavy-tailed distribution; thus, they heavily under-predict such instances. To tackle this problem, we introduce Learning to Place (L2P), which exploits the pairwise relationships between instances for learning. In its training phase, L2P learns a pairwise preference classifier: is instance A > instance B? In its placing phase, L2P obtains a prediction by placing the new instance among the known instances. Based on its placement, the new instance is then assigned a value for its outcome variable. Experiments on real data show that L2P outperforms competing approaches in terms of accuracy and ability to reproduce heavy-tailed outcome distribution. In addition, L2P provides an interpretable model by placing each predicted instance in relation to its comparable neighbors. Interpretable models are highly desirable when lives and treasure are at stake.Comment: 9 pages, 6 figures, 2 tables Nature of changes from previous version: 1. Added complexity analysis in Section 2.2 2. Datasets change 3. Added LambdaMART in the baseline methods, also a brief discussion on why LambdaMart failed in our problem. 4. Figure update

    HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks

    Full text link
    The unsupervised detection of anomalies in time series data has important applications in user behavioral modeling, fraud detection, and cybersecurity. Anomaly detection has, in fact, been extensively studied in categorical sequences. However, we often have access to time series data that represent paths through networks. Examples include transaction sequences in financial networks, click streams of users in networks of cross-referenced documents, or travel itineraries in transportation networks. To reliably detect anomalies, we must account for the fact that such data contain a large number of independent observations of paths constrained by a graph topology. Moreover, the heterogeneity of real systems rules out frequency-based anomaly detection techniques, which do not account for highly skewed edge and degree statistics. To address this problem, we introduce HYPA, a novel framework for the unsupervised detection of anomalies in large corpora of variable-length temporal paths in a graph. HYPA provides an efficient analytical method to detect paths with anomalous frequencies that result from nodes being traversed in unexpected chronological order.Comment: 11 pages with 8 figures and supplementary material. To appear at SIAM Data Mining (SDM 2020

    Disentangling Node Attributes from Graph Topology for Improved Generalizability in Link Prediction

    Full text link
    Link prediction is a crucial task in graph machine learning with diverse applications. We explore the interplay between node attributes and graph topology and demonstrate that incorporating pre-trained node attributes improves the generalization power of link prediction models. Our proposed method, UPNA (Unsupervised Pre-training of Node Attributes), solves the inductive link prediction problem by learning a function that takes a pair of node attributes and predicts the probability of an edge, as opposed to Graph Neural Networks (GNN), which can be prone to topological shortcuts in graphs with power-law degree distribution. In this manner, UPNA learns a significant part of the latent graph generation mechanism since the learned function can be used to add incoming nodes to a growing graph. By leveraging pre-trained node attributes, we overcome observational bias and make meaningful predictions about unobserved nodes, surpassing state-of-the-art performance (3X to 34X improvement on benchmark datasets). UPNA can be applied to various pairwise learning tasks and integrated with existing link prediction models to enhance their generalizability and bolster graph generative models.Comment: 17 pages, 6 figure
    • …
    corecore